Example: Evidence store using Blockchain technologies

Our proposal runs two Blockchain systems in parallel. The first, the Event Blockchain, stores the data generated by the defined indicators. It is therefore connected to the main components of the Big Data ecosystem: the Big Data Application Provider (BDAP) and the Big Data Framework Provider (BDFP). This Blockchain should record every action performed on the data, i.e., the typical Big Data services: collection, preparation, analysis, visualization, and access control. Each block should follow a similar structure: ID, user ID (the user who performed the operation), role of the user, timestamp, type of operation performed, and the data affected by the operation. Each operation nevertheless has its own characteristics that must be taken into account. For example, a collection block can also record the data source from which the data is being loaded into the Big Data ecosystem, and a visualization block should account for the possibility of inferring sensitive information from anonymized data by issuing a large number of queries against the same dataset. Figure 1 shows a UML diagram with a possible implementation of the Event Blockchain and how it is connected to the Big Data ecosystem.
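The block structure described above can be illustrated with a minimal sketch. It is not the implementation shown in Figure 1: the names EventBlock, EventChain, digest, and the extra field are hypothetical, and an in-memory list linked by SHA-256 hashes stands in for a real Blockchain platform (there is no consensus, networking, or persistence).

```python
import hashlib
import json
from dataclasses import dataclass, asdict, field
from typing import List, Optional


@dataclass
class EventBlock:
    """One block of the Event Blockchain (field names are illustrative)."""
    block_id: int
    user_id: str          # user who performed the operation
    role: str             # role of that user
    timestamp: float
    operation: str        # e.g. "collection", "preparation", "analysis"
    data_affected: str    # identifier of the data touched by the operation
    prev_hash: str        # digest of the previous block, linking the chain
    extra: dict = field(default_factory=dict)  # operation-specific details

    def digest(self) -> str:
        # Hash a canonical JSON form of all fields.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


class EventChain:
    """Append-only list of blocks linked by hashes (assumed stand-in)."""

    def __init__(self) -> None:
        self.blocks: List[EventBlock] = []

    def append(self, user_id: str, role: str, timestamp: float,
               operation: str, data_affected: str,
               extra: Optional[dict] = None) -> EventBlock:
        prev_hash = self.blocks[-1].digest() if self.blocks else "0" * 64
        block = EventBlock(len(self.blocks), user_id, role, timestamp,
                           operation, data_affected, prev_hash, extra or {})
        self.blocks.append(block)
        return block

    def is_valid(self) -> bool:
        # Each block must reference the digest of its predecessor.
        # Note: this check cannot detect changes to the last block,
        # because no later block references it yet.
        for prev, cur in zip(self.blocks, self.blocks[1:]):
            if cur.prev_hash != prev.digest():
                return False
        return True


chain = EventChain()
# A collection event also records its data source in the extra field.
chain.append("u42", "data_engineer", 1700000000.0, "collection",
             "dataset/sales", {"source": "sensor-gateway"})
chain.append("u17", "data_analyst", 1700000060.0, "analysis",
             "dataset/sales")
print(chain.is_valid())
```

The extra field is one possible way to hold the operation-specific characteristics mentioned above, such as the data source of a collection event, without changing the common block structure.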

In parallel, a second Blockchain system, the Incident Blockchain, stores all the data related to incidents that have already been identified. To do so, it is connected to a component that monitors compliance with the requirements of the Big Data ecosystem, so that when an incident occurs it is detected and stored in the Incident Blockchain. The incident record alone is not enough, however: additional data that could help in the recovery of the system must also be stored. To that end, when an incident is identified, all event data that may be related to it is copied to this second system through a proxy between the two Blockchain systems.

Once the data has been collected in the Blockchain systems, it must be analyzed. This requires an intelligent system that uses Machine Learning techniques to obtain value from these data, for example to identify why an incident happened or to predict the occurrence of a new incident by analyzing events in real time. Established processes can help with that task, such as the work of Veeramachaneni et al., which proposes a series of steps to teach a Big Data system to detect attacks: first, an unsupervised learning algorithm is run to detect outliers, which are then analyzed by cybersecurity experts; the result of this analysis is used as a labeled dataset to train a supervised learning algorithm, and the cycle is repeated over several iterations to improve prediction. Moreover, all this data should be visualized in a dashboard monitored by the Incident Response Team. Figure 2 depicts the interaction between the Big Data ecosystem and the Blockchain systems, showing a summarized version of the different components of the SRA for Big Data ecosystems. As explained previously, this activity is not carried out sequentially, but rather in parallel with the rest of the activities of this phase.
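The proxy step described above, in which event data possibly related to an incident is copied into the Incident Blockchain, could be sketched as follows. The text does not specify how "related" is decided, so the rule used here (same affected data, within a time window before the incident) is purely an assumption. The names Event, Incident, IncidentRecord, and IncidentProxy are hypothetical, and plain Python lists stand in for both Blockchain systems.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    event_id: int
    user_id: str
    timestamp: float
    operation: str
    data_affected: str


@dataclass(frozen=True)
class Incident:
    incident_id: int
    timestamp: float
    data_affected: str
    description: str


@dataclass
class IncidentRecord:
    """What gets stored in the (simulated) Incident Blockchain."""
    incident: Incident
    related_events: List[Event]


class IncidentProxy:
    """Copies related events from the event log to the incident store."""

    def __init__(self, event_log: List[Event],
                 window_seconds: float = 3600.0) -> None:
        self.event_log = event_log            # stand-in for Event Blockchain
        self.window_seconds = window_seconds  # assumed look-back window
        self.incident_store: List[IncidentRecord] = []  # stand-in for Incident Blockchain

    def _is_related(self, event: Event, incident: Incident) -> bool:
        # Assumed rule: same data, and the event happened at or before
        # the incident, no earlier than window_seconds before it.
        same_data = event.data_affected == incident.data_affected
        age = incident.timestamp - event.timestamp
        return same_data and 0 <= age <= self.window_seconds

    def report(self, incident: Incident) -> IncidentRecord:
        related = [e for e in self.event_log if self._is_related(e, incident)]
        record = IncidentRecord(incident, related)
        self.incident_store.append(record)
        return record


log = [
    Event(0, "u1", 100.0, "collection", "dataset/a"),
    Event(1, "u2", 200.0, "analysis", "dataset/b"),
    Event(2, "u1", 900.0, "visualization", "dataset/a"),
]
proxy = IncidentProxy(log, window_seconds=1000.0)
record = proxy.report(Incident(0, 1000.0, "dataset/a", "policy violation"))
print([e.event_id for e in record.related_events])
```

A real deployment would likely use a richer relatedness rule (for instance, also matching on user or role), since the purpose of the copy is to preserve any evidence that may help recovery and later analysis.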